Scientists are frequently faced with the important decision to start or terminate a creative partnership. This 
process can be influenced by strategic motivations, as early career researchers are pursuers, whereas senior 
researchers are typically attractors, of new collaborative opportunities. Focusing on the longitudinal aspects of 
scientific collaboration, we analyzed 473 collaboration profiles using an ego-centric perspective which accounts 
for researcher-specific characteristics and provides insight into a range of topics, from career achievement and 
sustainability to team dynamics and efficiency. From more than 166,000 collaboration records, we quantify 
the frequency distributions of collaboration duration and tie-strength, showing that collaboration networks are 
dominated by weak ties characterized by high turnover rates. We use analytic extreme-value thresholds to 
identify a new class of indispensable ‘super ties’, the strongest of which commonly exhibit > 50% publication 
overlap with the central scientist. The prevalence of super ties suggests that they arise from career strategies 
based upon cost, risk, and reward sharing and complementary skill matching. We then use a combination of 
descriptive and panel regression methods to compare the subset of publications coauthored with a super tie to 
the subset without one, controlling for pertinent features such as career age, prestige, team size, and prior group 
experience. We find that super ties contribute to above-average productivity and a 17% citation increase per 
publication, thus identifying these partnerships - the analog of life partners - as a major factor in science career 
development. 


A scientist will encounter many potential collaborators throughout his/her career. As such, the choice to start or terminate a collaboration can be an important strategic consideration with long-term implications. While previous studies have focused primarily on aggregate cross-sectional collaboration patterns, here we analyze the collaboration network from a researcher’s local perspective along his/her career. Our longitudinal approach reveals that scientific collaboration is characterized by a high turnover rate juxtaposed with surprisingly frequent ‘life partners’. We show that these extremely strong collaborations have a significant positive impact on productivity and citations (the apostle effect), representing the advantage of ‘super’ social ties characterized by trust, conviction, and commitment. For the Supporting Information see the published version: A. M. Petersen (2015) Proc. Natl. Acad. Sci. USA 112, E4671-E4680. DOI: 10.1073/pnas.1501444112


Science operates at multiple scales, ranging from the global and institutional scale down to the level of groups and individuals [1]. Integrating this system are multi-scale social networks that are ripe with structural, social, economic, and behavioral complexity [2]. A subset of this multiplex is the scientific collaboration network, which forms the structural foundation for social capital investment, knowledge diffusion, reputation signaling, and important mentoring relations [3-8].

Here we focus on collaborative endeavors that result in scientific publication, a process that draws on various aspects of social ties, e.g. colocation, disciplinary identity, competition, mentoring, and knowledge flow [9]. The dichotomy between strong and weak ties is a longstanding point of research [10]. However, in ‘science of science’ research, most studies have analyzed macroscopic collaboration networks aggregated across time, discipline, and individuals [11-21]. Hence, despite these significant efforts, we know little about how properties of the local social network affect scientists’ strategic career decisions. For example, how might creative opportunities in the local collaboration network impact a researcher’s decision to explore new avenues versus exploit old partnerships, and what are the career tradeoffs in the short versus the long term, especially considering that academia is driven by dynamic knowledge frontiers [22, 23]?

Against this background, we develop a quantitative approach for improving our understanding of the role of weak and strong ties, while also uncovering a third classification, the ‘super tie’, which we find to occur rather frequently. We analyzed longitudinal career data for researchers from cell biology and physics, together comprising a set of 473 researcher profiles spanning more than 15,000 career years, 94,000 publications, and 166,000 collaborators. In order to account for prestige effects, we define 2 groups within each discipline set, facilitating a comparison of top-cited scientists with scientists who are more representative of the entire researcher population (henceforth referred to as “other”). From the $N_i$ publication records spanning the first $T_i$ career years of each central scientist $i$, we constructed longitudinal representations of each scientist’s coauthorship history.

We adopt an ego-centric perspective in order to track research careers from their inception along their longitudinal growth trajectory. By using a local perspective we control for the heterogeneity in collaboration patterns that exists both between and within disciplines. We also control for other career-specific collaboration and productivity differences that would otherwise be averaged out by aggregate cross-sectional methods. Thus, by simultaneously leveraging multiple features of the data, resolved over the dimensions of time, individuals, productivity, and citation impact, our analysis contributes to the literature on science careers as well as on team activities characterized by the dynamic entry and exit of human, social, and creative capital. Given that collaborations in business, industry, and academia are increasingly operationalized via team structures, our findings provide relevant quantitative insights into the mechanisms of team formation [15], efficiency [24], and performance [25, 26].

Our study is organized as follows. The longitudinal nature of a career requires that we start by quantifying the tie between two collaborators from two different perspectives: duration and strength. First we analyze the collaboration duration, $L_{ij}$, defined as the time period between the first and last publication coauthored by researchers $i$ and $j$. Our results indicate that the “invisible college” defined by collaborative research activities (i.e. excluding informal communication channels and arm’s length associations) is surprisingly dominated by high-frequency interactions lasting only a few years. We then focus our analysis on the collaborative ‘tie strength’, $K_{ij}$, defined as the cumulative number of publications coauthored by $i$ and $j$ during the $L_{ij}$ years of activity.

From the entire set of collaborators, we then identify a subset of ‘super tie’ coauthors: those $j$ with $K_{ij}$ values that are statistically unlikely according to an author-specific extreme-value criterion. Because almost all of the researchers we analyzed have more than one super tie, and roughly half of the publications we analyzed include at least one super-tie coauthor, we were able to quantify the added value of super ties, for both productivity and citation impact, in two ways: (i) using descriptive measures and (ii) implementing a fixed-effects regression model. Controlling for author-specific features, we find that super ties are associated with increased publication rates and increased citation rates.

We term this finding the ‘apostle effect’, signifying the dividends generated by extreme social ties based upon mutual trust, conviction, and commitment. This term borrows from the biblical context, where an apostle represents a distinguished partner selected according to his/her noteworthy attributes from among a large pool of candidates. We do not connote any particular power relation (hierarchy) between $i$ and the super-tie coauthors, which is beyond the scope of this study. Also, because the perspective is centered around $i$, our super-tie definition is not symmetric, i.e. if $j$ is a super tie of $i$, $i$ is not necessarily a super tie of $j$.

Because super ties have a significant long-term impact on productivity and citations, our results are important from a career-development perspective, reflecting the strategic benefits of cost, risk, and reward sharing via long-term partnership. The implications of research partnerships will become increasingly relevant as more careers become inextricably embedded in team-science environments, wherein it can be difficult to identify contributions, signal achievement, and distribute credit. The credit-distribution problem has received recent attention from the perspectives of institutional policy [8], team ethics [7], and practical implementation [27-29].


Results 

Defining the ego collaboration network. Our framework assumes the perspective of the central scientist $i$ in the ego network formed by all of his/her collaborators (indexed by $j$). We use longitudinal publication data from Thomson Reuters Web of Knowledge (TRWOK), comprising 193 biology and 280 physics careers. Each career profile is constructed by aggregating the collaboration metadata over the first $t = 1 \ldots T_i$ years of his/her career. We downloaded the TRWOK data in calendar year $Y_i$, which is the citation count census year. Each disciplinary set includes a subset of 100 highly-cited scientists (hereafter referred to as “top”), selected using a ranking of the top-cited researchers in the high-impact journals Physical Review Letters and Cell. The rest of the researcher profiles (“other”) are aggregated across physics and cell biology, with subsets that are specifically active in the domains of graphene, neuroscience, molecular biology, and genomics. The “other” dataset only includes $i$ with at least as many publications as the smallest $N_i$ among the top-cited researchers: as such, $N_i \geq 52$ for biology and $N_i \geq 46$ for physics. This facilitates a reasonable comparison between “top” and “other”, possibly identifying differences attributable to innate success factors. See the Supporting Information Text (SI Text) for further details on the data selection.

This longitudinal approach leverages author-specific factors, revealing how career paths are affected by idiosyncratic events. To motivate this point, Fig. 1 illustrates the career trajectory of A. Geim, co-winner of the 2010 Nobel Prize in Physics. This schematic highlights three fundamental dimensions of collaboration ties, namely duration, strength, and impact:

(a) each horizontal line indicates the collaboration of length $L_{ij} = t^f_{ij} - t^0_{ij} + 1$ between $i$ and coauthor $j$, beginning with their first joint publication in year $t^0_{ij}$ and ending with their last observed joint publication in year $t^f_{ij}$;

(b) the circle color indicates the total number of joint publications, $K_{ij}$, representing our quantitative measure of ‘tie strength’;

(c) the circle size indicates the net citations $C_{ij} = \sum_p c_p$, where $c_p$ is the citation count of publication $p$ in $Y_i$, summed over all publications $p$ that include $i$ and $j$.

Figs. S1 and S2 in the SI Text further illustrate the variability in collaboration strengths, both between and within career profiles. It is also worth mentioning that since multiple $j$ may contribute to the same $p$, it is possible for coauthor measures to covary. However, for the remainder of the analysis we focus on the dyadic relations between only $i$ and $j$, leaving the triadic and higher-order ‘team’ structures as an avenue for future work. For example, it would be interesting to know the likelihood of triadic closure between any two super ties of $i$, signaling coordinated cooperation; conversely, low triadic closure rates may indicate hierarchical organization around $i$.
FIG. 1: Visualizing the embedding of academic careers in dynamic social networks. A career schematic showing A. Geim’s collaborations, ordered by entry year. Notable career events include the first publication in 2000 with K. S. Novoselov (co-winner of the 2010 Nobel Prize in Physics) and their first graphene publication in 2004. An interesting network reorganization accompanies Geim’s institutional move from Radboud University Nijmegen (NL) to U. Manchester (UK) in 2001. Moreover, the rapid accumulation of coauthors following the 2004 graphene discovery signals the new opportunities that accompany reputation growth.


Quantifying the collaboration lifetime distribution. We use $L_{ij}$ to measure the duration of the productive interaction between $i$ and $j$. We find that a remarkable 60 to 80 percent of the collaborations have $L_{ij} = 1$ year (see SI Text Fig. S4). Considering the overwhelming dominance of the $L_{ij} = 1$ events, in this subsection we concentrate our analysis on the subset of repeat collaborations ($L_{ij} > 1$) that produced two or more publications. Furthermore, due to censoring bias, $L_{ij}$ values estimated for $j$ who are active around the final career year of the data ($T_i$) may be biased towards small values. To account for this bias, in this subsection we also exclude those collaborations that were active within the final $L^0_i$-year period, defining $L^0_i$ as an initial average $L_{ij}$ value calculated across all $j$ for each $i$. Then, we calculate a second representative mean value, $\langle L_i \rangle$, which excludes the $j$ with $L_{ij} = 1$ and the $j$ active in the final $L^0_i$-year period. Figure 2(A) shows the probability distribution $P(\langle L_i \rangle)$, with mean values ranging from 4 to 6 years, consistent with the typical duration of an early career position (e.g. PhD or postdoctoral fellow, assistant professor).
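The two-stage estimate of $\langle L_i \rangle$ described above can be sketched as follows. This is a minimal illustration, not the study's code: the `(t0, tf)` input format and the function names are hypothetical, and "active within the final $L^0_i$-year period" is assumed here to mean a last joint publication year $t^f_{ij} > T_i - L^0_i$.

```python
from statistics import mean


def mean_duration(collabs, T_i):
    """Two-stage estimate of <L_i> for one central scientist i.

    collabs: list of (t0, tf) pairs giving the first and last joint
    publication career years with each coauthor j (hypothetical format).
    T_i: final career year covered by the data.
    """
    # L_ij = t^f_ij - t^0_ij + 1
    lengths = [tf - t0 + 1 for t0, tf in collabs]
    # Stage 1: L^0_i is the plain average over all coauthors j.
    L0 = mean(lengths)
    # Stage 2: drop one-year ties and ties still active in the final
    # L^0_i-year window (assumed: last joint year tf > T_i - L^0_i).
    # mean() raises StatisticsError if nothing survives the filter.
    kept = [L for (t0, tf), L in zip(collabs, lengths)
            if L > 1 and tf <= T_i - L0]
    return mean(kept)


def normalized_durations(collabs, T_i):
    """Lambda_ij = L_ij / <L_i> for every coauthor j of i."""
    L_mean = mean_duration(collabs, T_i)
    return [(tf - t0 + 1) / L_mean for t0, tf in collabs]


# Usage with made-up career data (not from the study).
collabs = [(1, 1), (2, 5), (3, 8), (10, 20), (25, 30), (28, 30)]
print(mean_duration(collabs, T_i=30))  # mean of the 4-, 6- and 11-year ties
```

Only the 4-, 6- and 11-year ties survive both filters in this toy profile: the one-year tie is dropped, and the two ties that end in year 30 fall inside the censoring window.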

Establishing statistical regularities across research profiles requires the use of a normalized duration measure, $\Lambda_{ij} = L_{ij}/\langle L_i \rangle$, which controls for author-specific collaboration patterns by measuring time in units of $\langle L_i \rangle$. The empirical distributions are right-skewed, with approximately 63% of the data having $L_{ij} < \langle L_i \rangle$ (corresponding to $\Lambda_{ij} < 1$). Nevertheless, approximately 1% of collaborations last longer than $4\langle L_i \rangle \approx 15$ to 20 years. Moreover, Fig. 2(A) shows that the log-logistic probability density function (pdf)

$$P(\Lambda) = \frac{(b/a)\,(\Lambda/a)^{b-1}}{\left[1 + (\Lambda/a)^{b}\right]^{2}} \qquad (1)$$

provides a good fit to the empirical data over the entire range of $\Lambda_{ij}$. The log-logistic (Fisk) pdf is a well-known survival analysis distribution with the property $\mathrm{Median}(\Lambda) = a$. By construction, the mean value $\langle \Lambda \rangle = 1$, which reduces our parameter space to just $b$, since $a = \sin(\pi/b)/(\pi/b)$. For each dataset we calculate $b > 2.6$, estimating the parameter using ordinary least squares. Associated with each $P(\Lambda)$ is a hazard function representing the likelihood that a collaboration terminates at a given $\Lambda_{ij}$. Since $b > 1$, the hazard function is unimodal, with a maximum value occurring at $\Lambda_c = a(b-1)^{1/b}$, with bounds $\Lambda_c > a$ for $b > 2$ and $\Lambda_c > 1$ for $b > 2.83\ldots$; using the best-fit $a$ and $b$ values we estimate $\Lambda_c \approx 0.94$ (top biology), 1.11 (other biology), 0.77 (top physics), and 1.08 (other physics). Thus, $\Lambda_c$ represents a tipping point in the sustainability of a collaboration, because the likelihood that a collaboration terminates peaks at $\Lambda_c$ and then decreases monotonically for $\Lambda_{ij} > \Lambda_c$. This observation lends further significance to the author-specific time scale $\langle L_i \rangle$. The log-logistic pdf is also characterized by asymptotic power-law behavior $P(\Lambda) \sim \Lambda^{-(b+1)}$ for large $\Lambda_{ij}$.
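The unit-mean log-logistic model of Eq. (1) and its hazard peak $\Lambda_c$ can be checked numerically with a short sketch. The function names are illustrative; the formulas are the ones stated above (scale $a = \sin(\pi/b)/(\pi/b)$, hazard = pdf / survival, peak at $a(b-1)^{1/b}$), and the fitting step itself is not reproduced.

```python
import math


def scale_for_unit_mean(b):
    """a = sin(pi/b) / (pi/b), the scale that fixes <Lambda> = 1 (needs b > 1)."""
    return math.sin(math.pi / b) / (math.pi / b)


def loglogistic_pdf(x, b):
    """Eq. (1) for x >= 0, with a set by scale_for_unit_mean."""
    a = scale_for_unit_mean(b)
    z = (x / a) ** b
    return (b / a) * (x / a) ** (b - 1) / (1.0 + z) ** 2


def loglogistic_cdf(x, b):
    """CDF z / (1 + z) with z = (x/a)^b; equals 1/2 at the median x = a."""
    a = scale_for_unit_mean(b)
    z = (x / a) ** b
    return z / (1.0 + z)


def hazard(x, b):
    """Termination hazard: pdf divided by the survival function 1 - CDF."""
    return loglogistic_pdf(x, b) / (1.0 - loglogistic_cdf(x, b))


def hazard_peak(b):
    """Lambda_c = a (b - 1)^(1/b): location of the hazard maximum for b > 1."""
    return scale_for_unit_mean(b) * (b - 1) ** (1.0 / b)


# Usage: Lambda_c crosses 1 near b = 2.83, as stated in the text.
for b in (2.5, 2.83, 3.0):
    print(b, round(hazard_peak(b), 3))
```

A brute-force scan of `hazard(x, b)` over a fine grid of `x` recovers the same peak location as the closed form, which is a convenient sanity check on the derivation $d h/d\Lambda = 0 \Rightarrow (\Lambda/a)^b = b - 1$.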

In order to determine how the $\Lambda_{ij}$ values are distributed across the career, we calculated the mean duration $\langle \Lambda | t \rangle$ using a 5-year (sliding window) moving average centered around career age $t$. If the $\Lambda_{ij}$ values were distributed independently of $t$, then $\langle \Lambda | t \rangle \approx 1$. Instead, Figure 2(B) shows a negative trend for each dataset. Interestingly, the $\langle \Lambda | t \rangle$ values are consistently larger for the top scientists, indicating that the relatively short $L_{ij}$ are more concentrated at larger $t$. This pattern of increasing access to short-term collaboration opportunities points to an additional positive feedback mechanism contributing to cumulative advantage [30, 31].
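A centered 5-year moving average such as $\langle \Lambda | t \rangle$ can be sketched as below. The text does not say which career age a given $\Lambda_{ij}$ is attached to; this sketch assumes it is the initiation age $t^0_{ij}$, and the `(t, Lambda_ij)` input format is hypothetical.

```python
from collections import defaultdict


def moving_average_by_age(pairs, window=5):
    """<Lambda|t>: mean of Lambda_ij over a centered sliding window.

    pairs: iterable of (t, Lambda_ij), where t is assumed to be the career
    age at which the collaboration started (an illustrative assumption).
    Returns {t: mean Lambda over ages t - h ... t + h}, h = window // 2.
    """
    by_age = defaultdict(list)
    for t, lam in pairs:
        by_age[t].append(lam)
    h = window // 2
    out = {}
    for t in range(min(by_age), max(by_age) + 1):
        # Pool every value whose age falls inside the window around t.
        vals = [v for s in range(t - h, t + h + 1) for v in by_age.get(s, [])]
        if vals:
            out[t] = sum(vals) / len(vals)
    return out


# Usage with toy values that shrink with career age.
print(moving_average_by_age([(1, 2.0), (2, 1.0), (3, 1.0), (4, 0.5), (5, 0.5)]))
```

Pooling raw values inside the window (rather than averaging per-age means) weights each collaboration equally, which is one of several reasonable conventions.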
FIG. 2: Log-logistic distribution of collaboration duration. (A) The probability distribution $P(\Lambda)$ is right-skewed and well fit by the log-logistic pdf defined in Eq. [1]. (Insets) The probability distributions $P(\langle L_i \rangle)$ show that the characteristic collaboration length in physics and biology is typically between 2 and 6 years. (B) The decrease in the typical collaboration timescale, $\langle \Lambda | t \rangle$, reflects how careers transition from being pursuers of collaboration opportunities to attractors of collaboration opportunities.


Quantifying the collaboration life cycle. The $P(\Lambda)$ distribution points to the variability of time scales in the scientific collaboration network: while a small number of collaborations last a lifetime, the remainder decay quite quickly in a collaboration environment characterized by a remarkably high churn rate. Since a relatively long $L_{ij}$ may correspond to just the minimum of 2 publications, it is also important to analyze the collaboration rate. To this end, we quantify the patterns of growth and decay in tie strength using the more than 166,000 dyadic ($ij$) collaboration records: $K_{ij}(t)$ is the cumulative number of coauthored publications between $i$ and $j$ up to year $t$, and $\Delta K_{ij}(t) = K_{ij}(t) - K_{ij}(t-1)$ is the annual publication rate.

In order to define a collaboration trajectory that is better suited for averaging, we normalize each individual $\Delta K_{ij}(\tau)$ by its peak value,

$$\Delta K'_{ij}(\tau) = \frac{\Delta K_{ij}(\tau)}{\mathrm{Max}_{\tau}[\Delta K_{ij}(\tau)]} . \qquad (2)$$

Here $\tau = \tau_{ij} = t - t^0_{ij} + 1$ is the number of years since the initiation of a given collaboration. This normalization procedure is useful for comparing and averaging time series that are characterized by just a single peak.

Expecting that the collaboration trajectories depend on the tie strength, we grouped the individual $\Delta K'_{ij}(\tau)$ according to the normalized coauthor strength, $x_{ij} = K_{ij}/\langle K_i \rangle$. The normalization factor $\langle K_i \rangle = S_i^{-1} \sum_{j=1}^{S_i} K_{ij}$ is calculated across the $S_i$ distinct collaborators (the collaboration radius of $i$), and represents an intrinsic collaboration scale that grows in proportion to both an author’s typical collaboration size and his/her publication rate. We then aggregated the trajectories in each $\{x\}$ group and calculated the average trajectory

$$\langle \Delta K'_{ij}(\tau | x) \rangle = \frac{1}{N_{\{x\}}} \sum_{\{x\}} \Delta K'_{ij}(\tau | x) , \qquad (3)$$

where $N_{\{x\}}$ is the number of trajectories in group $\{x\}$.
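Eqs. (2) and (3) amount to a peak normalization followed by a group-wise average. The sketch below is illustrative only: the bin edges are a hypothetical choice (the text only says the groups are logarithmically spaced), and trajectories of unequal length are padded with zeros, which assumes no joint output after the last recorded year.

```python
from collections import defaultdict


def normalize_trajectory(dK):
    """Eq. (2): divide the annual counts Delta K_ij(tau) by their peak.

    dK[0] corresponds to tau = 1, the first year of the collaboration.
    """
    peak = max(dK)
    return [v / peak for v in dK]


def average_by_group(trajectories, x_values, bin_edges):
    """Eq. (3): average normalized trajectories within each {x} group.

    trajectories: list of Delta K_ij(tau) lists; x_values: matching x_ij.
    bin_edges: sorted group boundaries (hypothetical binning choice).
    Shorter trajectories are zero-padded (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for dK, x in zip(trajectories, x_values):
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= x < hi:
                groups[(lo, hi)].append(normalize_trajectory(dK))
                break
    averages = {}
    for key, trajs in groups.items():
        n = max(len(t) for t in trajs)
        padded = [t + [0.0] * (n - len(t)) for t in trajs]
        # Column-wise mean over the N_{x} trajectories in this group.
        averages[key] = [sum(col) / len(trajs) for col in zip(*padded)]
    return averages


# Usage: two weak ties and one strong tie.
print(average_by_group([[2, 1], [1, 1, 1], [4, 2, 0, 2]],
                       [0.5, 0.8, 5.0], [0, 1, 10]))
```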

Indeed, Fig. 3 shows that the collaboration ‘life cycle’ $\langle \Delta K'_{ij}(\tau | x) \rangle$ depends strongly on the relative tie strength $x_{ij} = K_{ij}/\langle K_i \rangle$. The trajectories with $x_{ij} > 12.0$ decay over a relatively long timescale, maintaining a value of approximately $0.2\,\mathrm{Max}[\Delta K_{ij}(\tau)]$ even 20 years after initiation, reminiscent of a ‘research life partner’. The trajectories with $x_{ij} \in [0.9, 1.4]$ represent common collaborations that decay exponentially over the characteristic time scale $\langle L_i \rangle$. A mathematical side note, useful as a modeling benchmark, is the linear decay when plotted on log-linear axes, suggesting a functional form that is exponential for large $\tau$, $\langle \Delta K'_{ij}(\tau | x) \rangle \sim \exp(-\tau/\tau_0)$ for some characteristic decay time $\tau_0$.

FIG. 3: Growth and decay of collaboration ties. (A,B) Average collaboration intensity, normalized to peak value, measured $\tau_{ij}$ years after the initiation of the collaboration tie. (Insets) On log-linear axes the decay appears linear, corresponding to an exponential form. (C,D) For each $\{x\}$ group we show the average and standard deviation (error bar) of $\tau_{1/2}$; we use logarithmically spaced $\{x\}$ groups that correspond by color to the same $\{x\}$ as in panels (A,B). The exponent $\zeta$ quantifies the scaling of $\langle \tau_{1/2} \rangle$ as a function of the normalized coauthor strength $x_{ij} = K_{ij}/\langle K_i \rangle$. The sublinear ($\zeta < 1$) values indicate that collaborations are distributed over a timescale that grows slower than proportionally to $x$; conversely, this means that longer collaborations are relatively more productive, being characterized by increasing marginal returns ($1/\zeta > 1$). SI Appendix Fig. S3 shows the analogous plot for the other physics and biology datasets; all 4 datasets exhibit similar features.

We further emphasize the ramifications of the life-cycle variation by quantifying the relation between $x_{ij}$ and the collaboration’s half-life $\tau_{1/2}$, defined as the number of years to reach half of the total collaborative output according to the relation $K_{ij}(\tau_{1/2}) = K_{ij}/2$. We observe a scaling relation $\langle \tau_{1/2} \rangle \sim x^{\zeta}$ with $\zeta$ values ranging from 0.4 to 0.5. Sublinear values ($\zeta < 1$) indicate that a collaboration with twice the strength is likely to have a corresponding $\tau_{1/2}$ that is less than doubled. This feature captures the burstiness of collaborative activities, which likely arises from the heterogeneous overlapping of multiple timescales, e.g. the variable contract lengths in science ranging from single-year contracts to lifetime tenure, the overlapping of multiple age cohorts, and the projects and grants themselves, which typically have relatively short terms. Nevertheless, $dx/d\tau_{1/2} \sim \tau_{1/2}^{1/\zeta - 1}$ is an increasing function for $\zeta < 1$, indicating increasing marginal returns with increasing $\tau_{1/2}$, further signaling the productivity benefits of long-term collaborations characterized by formalized roles, mutual trust, experience, and group learning that together facilitate efficient interactions.
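The half-life and the scaling exponent $\zeta$ can be sketched as follows. Two assumptions are made for illustration: time is discrete, so $\tau_{1/2}$ is taken as the first year in which the cumulative count reaches $K_{ij}/2$; and $\zeta$ is estimated by an ordinary least-squares slope in log-log space over group-level pairs $(x, \langle \tau_{1/2} \rangle)$. The original fitting procedure may differ.

```python
import math


def half_life(dK):
    """tau_{1/2}: first year tau (1-based) at which the cumulative count
    K_ij(tau) reaches half of the final K_ij (discrete-time assumption)."""
    total = sum(dK)
    cum = 0
    for tau, v in enumerate(dK, start=1):
        cum += v
        if cum >= total / 2:
            return tau
    raise ValueError("empty trajectory")


def scaling_exponent(xs, taus):
    """Least-squares slope of log <tau_1/2> versus log x, i.e. zeta in
    <tau_1/2> ~ x^zeta (a simple log-log fit, assumed here)."""
    lx = [math.log(x) for x in xs]
    lt = [math.log(t) for t in taus]
    mx, mt = sum(lx) / len(lx), sum(lt) / len(lt)
    cov = sum((a - mx) * (b - mt) for a, b in zip(lx, lt))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


# Usage: a late-blooming tie, and an exact square-root scaling (zeta = 0.5).
print(half_life([1, 0, 3]))
print(scaling_exponent([1, 4, 16], [1, 2, 4]))
```

With $\zeta = 0.5$, inverting gives $x \sim \tau_{1/2}^{2}$, so $dx/d\tau_{1/2}$ grows linearly in $\tau_{1/2}$: the increasing marginal returns described above.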


Quantifying the tie-strength distribution. Here we focus on the cross-sectional distribution of tie strengths within the ego network. We use the final tie strength value $K_{ij}$ to distinguish the strong ties ($K_{ij} > \langle K_i \rangle$) from the weak ties ($K_{ij} < \langle K_i \rangle$). Figure 4(A) shows the cumulative distribution $P(< \langle K_i \rangle)$ of the mean tie strength $\langle K_i \rangle$, which can vary over a wide range depending on a researcher’s involvement in large team-science activities. We also quantify the concentration of tie strength using the Gini index $G_i$ calculated from each researcher’s $K_{ij}$ values; the distribution $P(< G_i)$ is shown in Fig. 4(B). Together, these two measures capture the variability in collaboration strengths across and within disciplines, with physics exhibiting larger $\langle K_i \rangle$ and $G_i$ values.
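The strong/weak split and the Gini index $G_i$ of one researcher's tie strengths can be computed as below. The text does not specify which Gini estimator was used; this sketch assumes the standard population formula based on sorted values, which gives 0 when all $K_{ij}$ are equal.

```python
def split_ties(K):
    """Strong ties have K_ij > <K_i>; weak ties have K_ij < <K_i>."""
    mean_K = sum(K) / len(K)
    strong = [k for k in K if k > mean_K]
    weak = [k for k in K if k < mean_K]
    return strong, weak


def gini(values):
    """Gini index of the tie strengths K_ij of one researcher.

    Standard sorted-values formula (assumed variant):
    G = 2 * sum_k k * v_(k) / (n * sum v) - (n + 1) / n.
    """
    v = sorted(values)
    n = len(v)
    total = sum(v)
    weighted = sum(k * x for k, x in enumerate(v, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Usage: one dominant coauthor among many one-off ties.
K = [1] * 9 + [30]
print(split_ties(K)[0], round(gini(K), 3))
```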

Another important author-specific variable is the publication overlap between each researcher and his/her top collaborator. This measure is defined as the fraction of a researcher’s $N_i$ publications that include his/her top collaborator, $f_{K,i} = \mathrm{Max}_j[K_{ij}]/N_i$. We observe surprisingly large variation in $f_{K,i}$, with mean and standard deviation of 0.16 ± 0.14 for the top scientists and 0.36 ± 0.23 for the other scientists. Across all profiles, the min and max $f_{K,i}$ values are 0.03 and 0.99, respectively, representing nearly the maximum possible variation in observed publication overlap. An example of this limiting scenario is shown in Fig. S2, highlighting the “dynamic duo” of J. L. Goldstein and M. S. Brown, winners of the 1985 Nobel Prize in Physiology or Medicine.
FIG. 4: Characteristic measures of collaboration tie strength. (A) Cumulative distribution of the mean collaboration strength, $\langle K_i \rangle$. The Kolmogorov-Smirnov (K-S) test indicates that the $P(\langle K_i \rangle)$ are similar for biology ($p = 0.031$) and significantly different for physics ($p = 0.004$). Vertical lines indicate the median value. (B) Cumulative distribution of $G_i$. The pairwise K-S test indicates that the $P(G_i)$ are similar for biology ($p = 0.14$) but not for physics ($p = 0.02$). Vertical lines indicate the mean value, with physics exhibiting significantly higher $G_i$ than biology. (C,D) For each dataset, the cumulative distribution of normalized collaboration strength $x_{ij}$ shows excellent agreement with the exponential distribution $P(> x) = \exp[-x]$ (gray line) over the bulk of the distribution, with the deviations in the tail regime representing less than 0.1% of the data.


Goldstein and Brown published more than 450 publications each, with roughly $100 f_{K,i} \approx 95\%$ coauthored together. Remarkably, we find that overlaps larger than 50% are not uncommon: $100\,P(f_K > 0.5) \approx 9\%$ (biology) and $100\,P(f_K > 0.5) \approx 20\%$ (physics) of researchers $i$ have more than half of their publications with their strongest collaborator.

However, within a researcher profile, it is likely that more than just the top collaborator was central to his/her career. Indeed, key to our investigation is the identification of the extremely strong collaborators, the super ties, which are distinguished within the subset of strong ties. Hence, using the empirical information contained within each researcher’s tie-strength distribution, $P(K_{ij})$, we develop an objective super-tie criterion that is author-specific. First, in order to gain a better understanding of the statistical distribution of $K_{ij}$, we aggregated the tie-strength data across all research profiles, using the normalized collaboration strength $x_{ij}$. Figures 4(C,D) show the cumulative distribution $P(> x)$ for each discipline. Each $P(> x)$ is in good agreement with the exponential distribution $\exp[-x]$ (with mean value $\langle x \rangle = 1$ by construction), with the exception of the tail, $P(> x) < 10^{-3}$, which is home to extreme collaborator outliers. Thus, by a second means in addition to the result for $\Lambda_{ij}$, we find that roughly 2/3 of the ties we analyzed are weak (i.e. the fraction of observations with $x_{ij} < 1$ is given by $1 - 1/e \approx 0.63$).
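The comparison with the exponential baseline can be illustrated on synthetic data. The sketch below draws exponential samples with mean 1 (synthetic, not the study's data) and checks that the empirical $P(> x)$ tracks $\exp(-x)$ and that the weak-tie fraction approaches $1 - 1/e$. The helper names are illustrative; `normalized_strengths` shows how $x_{ij}$ would be formed per researcher before pooling.

```python
import math
import random


def normalized_strengths(K):
    """x_ij = K_ij / <K_i> for one researcher's coauthors (pool afterwards)."""
    mean_K = sum(K) / len(K)
    return [k / mean_K for k in K]


def ccdf(xs, x):
    """Empirical P(> x): fraction of pooled x_ij strictly above x."""
    return sum(1 for v in xs if v > x) / len(xs)


# Synthetic illustration only: exponential draws with mean 1.
random.seed(1)
pooled = [random.expovariate(1.0) for _ in range(100_000)]
for x in (0.5, 1.0, 2.0, 4.0):
    print(f"x={x}: empirical {ccdf(pooled, x):.4f}  exp(-x) {math.exp(-x):.4f}")

# Fraction of weak ties (x < 1); about 1 - 1/e = 0.632 for this baseline.
weak_fraction = sum(1 for v in pooled if v < 1.0) / len(pooled)
print(round(weak_fraction, 3))
```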

Based upon this empirical evidence, we use the discrete exponential distribution as our baseline model, $P(K_{ij}) \propto \exp(-\kappa_i K_{ij})$. We then use extreme statistics arguments to precisely define the author-specific super-tie threshold $K^c_i$.


The extreme statistic criterion posits that out of the $S_i$ empirical observations there should be just a single observation with $K_{ij} \geq K^c_i$. The threshold $K^c_i$ is operationalized by integrating the tail of $P(K_{ij})$ according to the equation $1/S_i = P(K_{ij} \geq K^c_i) = \exp(-\kappa_i K^c_i)$, with the analytic relation $\langle K_i \rangle = \sum_{K=1}^{\infty} K\,P(K) = e^{\kappa_i}/(e^{\kappa_i} - 1) \approx 1 + 1/\kappa_i$ for small $\kappa_i$. In the relatively large $S_i$ limit, $K^c_i$ is given by the simple relation

$$K^c_i = (\langle K_i \rangle - 1) \ln S_i . \qquad (4)$$

The advantage of this approach is that $K^c_i$ is nonparametric, depending only on the observables $\langle K_i \rangle$ and $S_i$. Thus, the super-tie threshold is proportional to $\langle K_i \rangle - 1$ (the $-1$ arises because the minimum $K_{ij}$ value is 1), with a logarithmic factor $\ln S_i$ reflecting the sample-size dependence. This extreme value criterion is generic, and can be derived for any data following a baseline distribution; for a succinct explanation of this analytic method see page 17 of ref. [32].
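Eq. (4) and the resulting super-tie labels can be sketched in a few lines. The input is one researcher's list of final tie strengths $K_{ij}$; treating a tie that exactly equals $K^c_i$ as a super tie ($\geq$) is an assumption about the boundary case.

```python
import math


def super_tie_threshold(K):
    """Eq. (4): K^c_i = (<K_i> - 1) ln S_i for one researcher i.

    K: list of final tie strengths K_ij over the S_i distinct coauthors.
    """
    S = len(K)
    mean_K = sum(K) / S
    return (mean_K - 1.0) * math.log(S)


def super_tie_indicators(K):
    """R_ij = 1 if K_ij >= K^c_i, else 0 (>= at the boundary is assumed)."""
    Kc = super_tie_threshold(K)
    return [1 if k >= Kc else 0 for k in K]


# Usage: 20 one-paper ties, 5 two-paper ties and one 30-paper partner.
K = [1] * 20 + [2] * 5 + [30]
print(round(super_tie_threshold(K), 2), sum(super_tie_indicators(K)))
```

Here $\langle K_i \rangle = 60/26$ and $\ln 26 \approx 3.26$, so the threshold is roughly 4.26 and only the 30-paper partner qualifies. The degenerate case discussed in the next paragraph is also visible: if every $K_{ij} = 1$, the threshold is 0 and every coauthor is labeled a super tie.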

In what follows, we label each coauthor $j$ with $K_{ij} \geq K^c_i$ a super tie, with indicator variable $R_{ij} = 1$. The rest of the ties, with $K_{ij} < K^c_i$, have indicator variable $R_{ij} = 0$. This method has limitations, specifically in the case that the collaboration profile does not follow an exponential $P(K_{ij})$. For example, consider the extreme case where every $K_{ij} = 1$, meaning that $K^c_i = 0$ (independent of $S_i$), resulting in all coauthors being super ties ($R_{ij} = 1$ for all $j$). This scenario is rare and unlikely to occur for researchers with relatively large $N_i$ and $S_i$, as in our researcher sample.

Quantifying the prevalence and impact of super ties. How common are super ties? For each profile we denote the number of coauthors that are super ties by $S_{R,i}$ (with complement $S_{\setminus R,i} = S_i - S_{R,i}$). SI Text Fig. S4 shows that the distribution of $S_{R,i}$ is rather broad, with mean and standard deviation of the $S_{R,i}$ values: 18 ± 13 (top bio.), 16 ± 13 (other bio.), 7.3 ± 4.8 (top phys.), 6.8 ± 5.1 (other phys.). The super-tie coauthor fraction, $f_{R,i} = S_{R,i}/S_i$, measures the super-tie frequency on a per-collaborator basis, with mean value $\langle f_R \rangle \approx 0.04$ (i.e. typically 1 super tie for every 25 coauthors). Furthermore, Fig. 5(A) shows that the distribution $P(< f_R)$ is common across the four datasets. We tested the universality of the probability distribution $P(f_R)$ between the top and other researcher datasets using the Kolmogorov-Smirnov (K-S) statistic, which tests the null hypothesis that the data come from the same underlying pdf. The smallest pairwise K-S test p-value between any two $P(f_R)$ is $p = 0.21$, indicating that we fail to reject the null hypothesis that the distributions are equal, highlighting that the four datasets are remarkably well matched with respect to the distribution of $f_{R,i}$.

On a per-paper basis, Fig. 5(B) shows that the fraction $f_{N,i}$ of a researcher’s portfolio coauthored with at least one super tie can vary over the entire range of possibilities, with mean and standard deviation 0.50 ± 0.18 (top bio.), 0.74 ± 0.13 (other bio.), 0.42 ± 0.19 (top phys.), and 0.58 ± 0.23 (other phys.). Furthermore, we found that 41% of the top scientists have $f_{N,i} \geq 0.5$. Interestingly, the distributions of $f_{N,i}$ and $f_{K,i}$ indicate that top scientists have lower levels of super-tie dependency than their counterparts.

FIG. 5: The frequency of super ties. Curves correspond to Top Biology, Other Biology, Top Physics, and Other Physics; vertical lines indicate the distribution mean. (A) Cumulative distribution of the fraction $f_{R,i}$ of the $S_i$ coauthors that are super ties. All pairwise comparisons of the distributions have K-S p-values of at least 0.21, indicating a common underlying distribution $P(f_R)$. (B) Cumulative distribution of the fraction $f_{N,i}$ of publications that include at least one super-tie coauthor. The top scientist distributions show mean values that are significantly smaller than their counterparts. (C) Cumulative distribution of the fraction $f_{K,i}$ of publications coauthored with his/her top collaborator. The mean and standard deviation are 0.15 ± 0.16 for biology (top), 0.31 ± 0.16 for biology (other), 0.17 ± 0.13 for physics (top), and 0.38 ± 0.26 for physics (other). (D) The mean rate of super ties per new collaboration, $\langle X_{R,i}(t) \rangle$, averaged over all the profiles in each dataset using observations aggregated over consecutive 3-year periods.
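The three per-researcher fractions $f_{R,i}$, $f_{N,i}$ and $f_{K,i}$ follow directly from a publication list and a set of super ties. In this sketch the input format (one set of coauthor names per publication of $i$, with $i$ excluded) and the function names are hypothetical; the super-tie set would come from the threshold rule in Eq. (4).

```python
from collections import Counter


def tie_strengths(papers):
    """K_ij: number of i's publications shared with each coauthor j."""
    return Counter(j for p in papers for j in p)


def portfolio_fractions(papers, super_ties, K):
    """Per-researcher super-tie measures (hypothetical input format).

    papers: list of coauthor sets, one per publication of i (i excluded).
    super_ties: set of coauthors j with R_ij = 1.
    K: mapping coauthor j -> K_ij.
    Returns (f_R, f_N, f_K).
    """
    N = len(papers)
    S = len(K)
    f_R = len(super_ties) / S                              # per coauthor
    f_N = sum(1 for p in papers if p & super_ties) / N     # per paper
    f_K = max(K.values()) / N                              # top-collaborator overlap
    return f_R, f_N, f_K


# Usage with a toy portfolio of four papers.
papers = [{"A", "B"}, {"A"}, {"C"}, {"A", "D"}]
K = tie_strengths(papers)
print(portfolio_fractions(papers, {"A"}, K))
```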

We also analyzed the arrival rate of super ties. For each profile we tracked the number of super ties initiated in year $t$, and normalized this number by the total number of new collaborations initiated in the same year. This ratio, $X_{R,i}(t)$, estimates the likelihood that a new collaboration eventually becomes a super tie as a function of career age $t$. For example, using the set of collaborations initiated in each scientist’s first year, we estimate the likelihood that a first-year collaborator (mentor) becomes a super tie at $X_{R,i}(t = 1) =$ 8% (top bio.), 16% (other bio.), 14% (top phys.), and 15% (other phys.). Figure 5(D) shows the mean arrival rate, $\langle X_{R,i}(t) \rangle$, calculated by averaging over all profiles in each dataset. The super-tie arrival rate declines across the career, reaching a 5% likelihood per new collaborator at $t = 20$ and a 2.5% likelihood by $t = 30$. The decay is not as fast for the top-cited scientists, possibly reflecting their preferential access to outstanding collaborators. However, the estimate for large $t$ is biased toward smaller values because collaborations initiated late in the career may not have had sufficient time to grow.
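The arrival rate $X_{R,i}(t)$ is a ratio of two counts per initiation year. The sketch below takes each coauthor's initiation year $t^0_{ij}$ and super-tie indicator $R_{ij}$ (hypothetical input format). The optional `period` argument, an illustrative addition, pools consecutive years, for instance into the 3-year blocks used in Fig. 5(D); each block is keyed by its first year.

```python
from collections import Counter


def super_tie_arrival_rate(start_years, is_super, period=1):
    """X_{R,i}(t): share of collaborations initiated at career age t that
    eventually become super ties.

    start_years: initiation year t^0_ij of each coauthor j (t >= 1).
    is_super: matching R_ij indicators (0 or 1).
    period: optional pooling of t into consecutive blocks (e.g. 3 years).
    """
    def block(t):
        return ((t - 1) // period) * period + 1

    new = Counter(block(t) for t in start_years)
    new_super = Counter(block(t) for t, r in zip(start_years, is_super) if r)
    # Counter returns 0 for blocks without any super tie.
    return {t: new_super[t] / new[t] for t in sorted(new)}


# Usage: four first-year coauthors, one of whom becomes a super tie.
print(super_tie_arrival_rate([1, 1, 1, 1, 2, 2, 3], [1, 0, 0, 0, 1, 1, 0]))
```

As noted in the text, late-career values from such a ratio are biased downward, since recent collaborations have had less time to cross the super-tie threshold.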

In the next two subsections, we investigate the role of super ties at the micro level by analyzing productivity at the annual time resolution and the citation impact of individual publications. In the SI Text we provide additional evidence for the advantage of super ties by developing descriptive methods that measure the net productivity and citations of the super ties relative to all other ties.


The Apostle effect I: Quantifying the impact of super ties
on annual productivity. We analyzed each research profile
over the career years t ∈ [6, min(29, T_i)], separating the
data into non-overlapping Δt-year periods, and neglecting the
first 5 years to allow L_ij(t) and K_ij(t) sufficient time to
grow. We then modeled the dependent variable n_{i,t}/⟨n_i⟩,
which is the productivity n_{i,t} aggregated over Δt-year periods,
normalized by the baseline average ⟨n_i⟩ calculated over the period
of analysis. Recent analysis of assistant and tenured professors
has shown that the annual publication rate is governed by
slow but substantial growth across the career, with fluctuations
that are largely related to collaboration size [24].

To better understand the factors contributing to productivity
growth, we include controls for career age t along with
four additional variables measuring the composition of
collaborators from each Δt-year period. First, we calculated the
average number of authors per publication, a_{i,t}, a proxy for
labor input, coordination costs, and the research technology
level. Second, we calculated the mean duration, L_{i,t}, by
averaging the L_ij(t − Δt) values (from the previous period) across
only the j who are active in t, i.e. those coauthors with
ΔK_ij(t) > 0. In this way, we account for the possibility that
j was not active in the previous period (t − Δt), in which case
L_ij(t − Δt) is even smaller than L_ij(t) − Δt. Thus, L_{i,t}
measures the prior experience between i and his/her collaborators.
Third, for the same set of coauthors as for L_{i,t}, we calculated
the Gini index of the collaboration strength, G_{i,t}, using the
tie strength values up to the previous period, K_ij(t − Δt).
Thus, G_{i,t} provides a standardized measure of the dispersion
in coauthor activity, with values ranging from 0 (all coauthors
published equally in the past with i) to 1 (extreme inequality
in prior publication with i). Hence, while L_{i,t} measures the
lifetime of the group's prior collaborations, G_{i,t} measures the
concentration of their prior experience. And finally, for each
period t, we calculated the contribution of super tie collaborators
normalized by the contribution of all other collaborators,


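$$ p_{i,t} = \frac{\sum_{j \,\mid\, R_{ij}=1} \Delta K_{ij}(t)}{\sum_{j \,\mid\, R_{ij}=0} \Delta K_{ij}(t)} , \tag{5} $$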


accounting for the possibility that the relative contribution of
super ties may affect productivity. While the total coauthor
contribution Σ_j ΔK_ij(t) is highly correlated with n_{i,t}, the
correlation coefficient between p_{i,t} and n_{i,t} is only 0.07. We
only include researchers in this analysis if there are > 4 data
points for which the denominator of Eq. [5] is nonzero.
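The three period-level covariates L_{i,t}, G_{i,t}, and p_{i,t} can be sketched for a single Δt period as below. This is a minimal illustration, not the authors' code; the dictionary inputs and function names are hypothetical, and coauthors missing from the previous-period data are assumed to be new (value 0). The Gini index uses the standard rank-weighted form of the mean-absolute-difference definition; with this finite-sample version the maximum is (n − 1)/n rather than exactly 1, and the text does not say which normalization was used.

```python
from statistics import mean

def gini(values):
    """Gini index of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form of sum_ij |x_i - x_j| / (2 n^2 mean).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def period_covariates(delta_k, prev_l, prev_k, super_ties):
    """Return (L_it, G_it, p_it) for one period of profile i, or None.

    delta_k    -- {j: Delta K_ij(t)}, joint publications in the current period
    prev_l     -- {j: L_ij(t - Delta t)}, collaboration duration so far
    prev_k     -- {j: K_ij(t - Delta t)}, tie strength so far
    super_ties -- set of coauthors j with R_ij = 1
    """
    active = [j for j, dk in delta_k.items() if dk > 0]
    if not active:
        return None
    L_it = mean(prev_l.get(j, 0) for j in active)
    G_it = gini([prev_k.get(j, 0) for j in active])
    super_input = sum(delta_k[j] for j in active if j in super_ties)
    other_input = sum(delta_k[j] for j in active if j not in super_ties)
    # Eq. (5) is undefined when no non-super-tie coauthor is active.
    p_it = super_input / other_input if other_input > 0 else None
    return L_it, G_it, p_it

res = period_covariates({"a": 2, "b": 1, "c": 1, "d": 0},
                        {"a": 5, "b": 1}, {"a": 6, "b": 2}, {"a"})
print(res)  # L = 2, G = 0.5 (up to float rounding), p = 1.0
```

Coauthor "d" is excluded because ΔK = 0, and the new coauthor "c" contributes zero prior duration and strength, which is what pulls L_{i,t} down when turnover is high.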

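We implemented a fixed effects regression of the model

$$ \frac{n_{i,t}}{\langle n_i \rangle} = \beta_{i,0} + \beta_a \ln a_{i,t} + \beta_L L_{i,t} + \beta_G G_{i,t} + \beta_p p_{i,t} + \beta_t t + \epsilon_{i,t} , \tag{6} $$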

which accounts for author-specific time-invariant features
(β_{i,0}), using robust standard errors to account for
autocorrelation within each i. Because the predictors are calculated from
the same ego profile, covariance is expected; for example, the
highest correlation coefficient between any two independent
variables is 0.32, between ln a_{i,t} and G_{i,t}, because the variance
in K_ij increases in proportion to the sample size. Table 1 shows
the results of our model estimates for Δt = 1 year
and Table S1 shows the results for Δt = 3 years. We also ran
the regression for all the datasets together (“All”), and provide
standardized coefficients that better facilitate a comparison of
the coefficient magnitudes.

We observed a positive coefficient β_p = 0.11 ± 0.01
(p < 0.003 for all datasets), meaning that larger contributions
by super ties are associated with above-average productivity.
By way of example, consider a scenario where the super ties
contribute a third of the total coauthor input, corresponding to
p_{i,t} = 0.5, the average p_{i,t} value we observed. Consider a
second scenario with p_{i,t} = 1, corresponding to equal input
by the super ties and their counterparts (p_{i,t} > 1 for 14% of
the observations). If all other parameters contribute a baseline
productivity value 1, then moving from the first to the second
scenario adds 0.5β_p, which corresponds to a
100 × 0.5β_p/(1 + 0.5β_p) = 5.2% productivity increase.
This value is consistent with the 5% productivity
spillover observed in a study of star scientists [33].
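The 5.2% figure can be checked directly. This short sketch simply evaluates the two scenarios described above (baseline 1, p_{i,t} = 0.5 versus p_{i,t} = 1) with β_p = 0.11; the function name is illustrative only.

```python
def relative_gain(beta_p, p_low=0.5, p_high=1.0, baseline=1.0):
    """Percent productivity increase when p_it rises from p_low to p_high."""
    low = baseline + beta_p * p_low    # scenario 1: super ties give 1/3 of input
    high = baseline + beta_p * p_high  # scenario 2: equal input
    return 100 * (high - low) / low

print(round(relative_gain(0.11), 1))  # 5.2
```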

We also found that periods corresponding to higher levels
of prior experience are associated with below-average
productivity (β_L < 0, p < 0.008 for all datasets except top
biology). Despite the costs associated with tie formation, this
result indicates that productivity can benefit from collaborator
turnover. Nevertheless, above-average productivity is
associated with higher inequality in the concentration of prior
experience (β_G > 0, p < 0.001 for all datasets). Together,
these results point to the benefits of strategically pairing
new collaborators with incumbent ones in order to promote
the atypical combination of knowledge backgrounds and
to achieve higher scientific impact [34]. In Table 1 we also
report standardized coefficients that facilitate a comparison of
the relative strengths of the model variables, revealing that β_G
is twice as strong as β_p and β_L. Interestingly, β_p and β_L have
opposite signs, yet are balanced in magnitude, suggesting a
compensation strategy for group managers.

The age coefficient β_t is also positive (p < 0.001 for
all datasets), consistent with patterns of steady productivity
growth observed for successful research careers [5, 24, 31].
Possible explanatory variables to consider in extended
analyses are the standard deviation in K_ij, a contact frequency
(K_ij/L_ij) measure of tie strength intensity per Granovetter's
original operationalization [10], and absolute calendar year y,
variables which we omit here to keep the model streamlined.

The Apostle effect II: Quantifying the impact of super ties
on the long-term citation of individual publications. The
impact of super ties on a publication's long-term citation tally
is difficult to measure, because older publications have had
more time to accrue citations than newer ones (a type of
censoring bias), and so a direct comparison of raw citation
counts for publications from different years is technically
flawed. To address this measurement problem, we map each
publication's citation count c_{i,p,y}, tallied in the data census
year, to a normalized z-score,

$$ z_{i,p,y} = \frac{\ln c_{i,p,y} - \langle \ln c^{m}_{y}(p) \rangle}{\sigma[\ln c^{m}_{y}(p)]} . \tag{7} $$

This citation measure is well-suited for the comparison of
publications from different y because z_{i,p,y} is measured
relative to the mean ⟨ln c^m_y(p)⟩ of the log citation counts of
publications from the same year y, in units of the standard deviation
σ[ln c^m_y(p)] [31]. Thus, we take advantage of the fact that the
distribution of citations obeys a universal log-normal distribution
for p from the same y and discipline [35]. In this way, z is
defined such that the distribution P(z) is sufficiently time
invariant. To confirm this property, we aggregated z_{i,p,y} within
successive 8-year periods, and calculated the conditional
distributions P(z|y), which are stable and approximately
normally distributed over the entire sample period (SI Text Fig.
S5).

To define the detrending indices ⟨...⟩ and σ[...] we use the
baseline journal set m comprising all research articles
collected from the journals Nature, Proceedings of the National
Academy of Sciences, and Science. We use this aggregation
of three multidisciplinary journals only to control for the
time-dependent features of citation counts. We chose these
journals as our baseline because they have relatively large impact
factors (high citation rates), and so the temporal information
contained in ⟨...⟩ and σ[...] is less noisy than for other m with
lower citation rates. Furthermore, since most publications
reach their peak citation rate within 5-10 years after
publication [5], we only analyze z_{i,p,y} with y ≤ 2003. In this way,
the z_{i,p,y} values we analyze are less sensitive to fluctuations
early in the citation life cycle, as well as to recent paradigm
shifts in science such as the internet, which affects the search,
retrieval, and citation of prior literature, and the rise of
open-access publishing.
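The detrending in Eq. (7) can be sketched as follows: compute the mean and standard deviation of ln c for the baseline journal set in each publication year, then express a paper's log citation count relative to its own year's baseline. This is a minimal sketch with hypothetical function names. Two choices are assumptions not fixed by the text: zero-citation papers are skipped (ln 0 is undefined), and the population standard deviation is used rather than the sample one.

```python
import math
from statistics import mean, pstdev

def baseline_log_stats(baseline_counts):
    """Per-year (mean, std) of ln c over the baseline journal set m.

    baseline_counts -- {y: [citation counts of all baseline articles from year y]}
    Zero counts are skipped (assumption: ln 0 is undefined).
    """
    stats = {}
    for y, counts in baseline_counts.items():
        logs = [math.log(c) for c in counts if c > 0]
        stats[y] = (mean(logs), pstdev(logs))  # population std (assumption)
    return stats

def z_score(c, y, stats):
    """Eq. (7): distance of ln c from the year-y baseline mean, in std units."""
    mu, sigma = stats[y]
    return (math.log(c) - mu) / sigma

# Toy baseline for one year (invented numbers).
stats = baseline_log_stats({2000: [1, 100]})
print(z_score(100, 2000, stats))  # 1.0 (one std above the baseline mean)
print(z_score(10, 2000, stats))   # 0.0 (at the baseline mean, up to rounding)
```

Because the comparison is made in log space, a paper's z-score depends only on its position relative to same-year papers, which is what allows papers from different y to be pooled.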

In our regression model we use 5 explanatory variables
which are author (i) and publication (p) specific. The first is
the number of coauthors, a_{i,p}, which controls for the tendency
for publications with more coauthors to receive more citations
[4]. This variable is also a gross measure of technology level and
coordination costs, since larger teams typically reflect
endeavors with higher technical challenge distributed across a wider
range of skill sets. We use ln a_{i,p} since the range of values is
rather broad, appearing to be approximately log-normally
distributed in the right tail [7]. The second explanatory variable
is the dummy variable R_{i,p}, which takes the value 1 if p
includes a super tie and the value 0 otherwise. Remarkably, the
percentage of publications including a super tie is rather close
to parity for three of the four datasets: 54% (top biology), 45%
(top physics), 74% (other biology) and 54% (other physics).
The third variable, t_{i,p}, is the career age of i at the time of
publication. The fourth variable N_i(t_{i,p}) is the total number of
publications up to year t_{i,p}, which is a non-citation-based
measure of the central author's reputation, visibility, and
experience within the scientific community. The final explanatory
variable is the collaboration radius, S_i(t_{i,p}), which is the
cumulative number of distinct coauthors up to t_{i,p}, representing
the central author's access to collaborative resources, as well
as an estimate of the number of researchers in the local
community who, having published with i, may preferentially cite
i. Hence, by including N_i(t_{i,p}) and S_i(t_{i,p}), we control for two
dimensions of cumulative advantage that could potentially
affect a publication's citation tally.
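The two cumulative-advantage controls can be computed from a publication list in a single pass over career years, as in the sketch below. The input format is hypothetical, and treating "up to t_{i,p}" as inclusive of year t_{i,p} (so that p itself is counted) is an assumption; the regression then uses ln N_i and ln S_i.

```python
from collections import defaultdict

def cumulative_counts(publications):
    """Return {t: (N_i(t), S_i(t))} for one profile.

    publications -- hypothetical list of (career_year, coauthor_ids) pairs
    N_i(t): publications with career year <= t (inclusive: an assumption)
    S_i(t): distinct coauthors across those publications
    """
    by_year = defaultdict(list)
    for year, coauthors in publications:
        by_year[year].append(coauthors)
    seen, n_pubs, out = set(), 0, {}
    for year in sorted(by_year):
        for coauthors in by_year[year]:
            n_pubs += 1
            seen.update(coauthors)
        out[year] = (n_pubs, len(seen))
    return out

pubs = [(1, ["a", "b"]), (1, ["a"]), (3, ["c"]), (2, ["b", "d"])]
print(cumulative_counts(pubs))  # {1: (2, 2), 2: (3, 3), 3: (4, 4)}
```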
TABLE I: Parameter estimates for the productivity model in Eq. (6) using Δt = 1 year long periods, and the citation model in Eq. (8)
using only the publications with y_p ≤ 2003. Each fixed effects model was calculated using robust standard errors, implemented by the
Huber/White/sandwich method. Values significant at the p < 0.04 level are indicated in boldface. “Std. coeff.” represents the estimates of the
standardized (beta) coefficients. “All” corresponds to the combination of all datasets. A is the number of researcher profiles included.

Apostle effect I: productivity model (n_{i,t})

Dataset           A     ln a_t           L_t              G_t             p_t             t               N_obs   Adj. R²
All               466    0.002 ± 0.029   -0.054 ± 0.008   1.788 ± 0.134   0.110 ± 0.013   0.029 ± 0.002   8483    0.19
  (Std. coeff.)          0.002 ± 0.033   -0.140 ± 0.021   0.320 ± 0.024   0.140 ± 0.016   0.049 ± 0.004
  p-value                0.943            0.000           0.000           0.000           0.000
Biology (top)     99    -0.123 ± 0.056   -0.011 ± 0.018   2.816 ± 0.270   0.111 ± 0.026   0.031 ± 0.003   2202    0.24
  p-value                0.031            0.519           0.000           0.000           0.000
Biology (other)   95    -0.061 ± 0.056   -0.067 ± 0.025   1.654 ± 0.287   0.071 ± 0.023   0.053 ± 0.006   1467    0.29
  p-value                0.275            0.008           0.000           0.003           0.000
Physics (top)     100   -0.146 ± 0.057   -0.047 ± 0.015   2.053 ± 0.287   0.153 ± 0.025   0.022 ± 0.004   2056    0.15
  p-value                0.012            0.002           0.000           0.000           0.000
Physics (other)   172    0.089 ± 0.050   -0.065 ± 0.013   1.495 ± 0.213   0.101 ± 0.021   0.026 ± 0.005   2758    0.15
  p-value                0.079            0.000           0.000           0.000           0.000

Apostle effect II: citation model (z_{i,p})

Dataset           A     ln a_p          R_p             t_p              ln N_i(t_p)      ln S_i(t_p)      N_obs   Adj. R²
All               377   0.263 ± 0.024   0.202 ± 0.023   -0.061 ± 0.004    0.062 ± 0.066    0.065 ± 0.072   68589   0.27
  (Std. coeff.)         0.135 ± 0.012   0.129 ± 0.015   -0.039 ± 0.003    0.044 ± 0.046    0.050 ± 0.055
  p-value               0.000           0.000            0.000            0.347            0.367
Biology (top)     100   0.263 ± 0.039   0.213 ± 0.033   -0.029 ± 0.007   -0.138 ± 0.102    0.062 ± 0.112   22135   0.12
  p-value               0.000           0.000            0.000            0.177            0.578
Biology (other)   55    0.579 ± 0.053   0.152 ± 0.066   -0.031 ± 0.015   -0.179 ± 0.095    0.211 ± 0.094   4801    0.20
  p-value               0.000           0.026            0.040            0.065            0.029
Physics (top)     100   0.139 ± 0.043   0.230 ± 0.044   -0.070 ± 0.007    0.277 ± 0.118   -0.119 ± 0.135   22673   0.19
  p-value               0.002           0.000            0.000            0.021            0.380
Physics (other)   122   0.272 ± 0.042   0.235 ± 0.049   -0.060 ± 0.008    0.082 ± 0.095    0.017 ± 0.104   18980   0.19
  p-value               0.000           0.000            0.000            0.389            0.870

We then implement a fixed-effects regression to estimate
the parameters of the citation impact model,

$$ z_{i,p,y} = \beta_{i,0} + \beta_a \ln a_{i,p} + \beta_R R_{i,p} + \beta_t t_{i,p} + \beta_N \ln N_i(t_{i,p}) + \beta_S \ln S_i(t_{i,p}) + \epsilon_{i,p} , \tag{8} $$

using the Huber/White/sandwich method to calculate robust
standard error estimates that account for heteroskedasticity
and within-panel serial correlation in the idiosyncratic error
term ε_{i,p}. We excluded publications with y_p > 2003, and in
order that the ‘top’ and ‘other’ datasets are well-balanced, we
also excluded the ‘other’ researchers with fewer than 43 (bio.)
and 33 (phys.) publications (observations) as of 2003. Table
1 lists the (standardized) parameter estimates.

We estimated β_R = 0.20 ± 0.02 (p < 0.026 level in each
regression), indicating a significant relative citation increase
when a publication is coauthored with at least one super tie.
The standardized β_a and β_R coefficients are roughly equal,
meaning that increasing a_p from 1 (a solo-author publication)
to e ≈ 3 coauthors produces roughly the same effect
as a change in R_p from 0 to 1. Thus, while larger team size
correlates with more citations [4], the relative strength of β_R
stresses the importance of ‘who’ in addition to ‘how many’.

Interestingly, the career age parameter β_t = −0.061 ±
0.004 is negative (significant at the p < 0.04 level in each
regression), meaning that researchers' normalized citation
impact decreases across the career, possibly due to finite career
and knowledge life cycles. This finding is consistent with a
large-scale analysis of researcher histories within high-impact
journals, which also shows a negative trend in the citation
impact across the career [31]. Neither the reputation (β_N) nor
collaboration radius (β_S) parameters were consistently
statistically significant in explaining z_{i,p,y}, likely because they are
highly correlated with t_p for established researchers.
Modifications to consider in follow-up analysis are controls for the
impact factor of the journal publishing p, the absolute year y
in order to account for shifts in citation patterns in the
post-internet era, and removing self-citations from super ties.
Unfortunately, this last task requires a substantial increase in data
coverage, far beyond the relatively small amount needed to
construct individual ego-network collaboration profiles.

We develop three additional descriptive methods in the SI
Text to compare the subset of publications with at least one
super tie to the complementary subset of publications
without one. These investigations provide further evidence for the
apostle effect. First, we defined an aggregate career measure,
the productivity premium ρ_{n,i} (see SI Text Eq. [S1]), which
measures the average K_ij value among the super ties relative
to all the other collaborators. Second, we defined a similar
career measure, the citation premium ρ_{c,i} (see SI Text Eq. [S5]),
which quantifies the average citation impact attributable to
super ties relative to all the other collaborators.
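A per-profile productivity premium along the lines described above can be sketched as a ratio of means. The exact definition is given in SI Text Eq. [S1], which is not reproduced here, so the ratio-of-means form and the function name are assumptions made for illustration.

```python
from statistics import mean

def productivity_premium(tie_strength, super_ties):
    """Mean K_ij over super ties divided by mean K_ij over all other coauthors.

    tie_strength -- {j: K_ij}; super_ties -- set of coauthors j flagged as super ties
    (Ratio-of-means form is an assumption; see SI Eq. [S1] for the original.)
    """
    strong = [k for j, k in tie_strength.items() if j in super_ties]
    others = [k for j, k in tie_strength.items() if j not in super_ties]
    return mean(strong) / mean(others)

# Toy profile (invented numbers): super ties average 16 papers, others 2.
print(productivity_premium({"a": 20, "b": 12, "c": 2, "d": 1, "e": 3}, {"a", "b"}))  # 8.0
```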

Independent of dataset, we observed rather substantial
premium values. For example, the productivity premium has an
average value ⟨ρ_n⟩ ≈ 8, meaning that on a per-collaborator
basis, productivity with super ties is roughly 8 times higher
than with the remaining collaborators. Similarly, the citation
premium ρ_{c,i} is also significantly right-skewed, with average
value ⟨ρ_c⟩ ≈ 14, meaning that net citation impact per super
tie is 14 times larger than the net citation impact from all other
collaborators. We emphasize that ρ_{c,i} appropriately accounts
for team size by using an equal partitioning of citation credit
across the a_p coauthors, remedying the multiplicity problem
concerning citation credit.

And third, we calculated an additional estimate of the
publication-level citation advantage due to super ties. For
both biology and physics, we found that the publications
with super ties receive roughly 17% more citations than their
counterparts. In basic terms, this means that the average
publication with a super tie has 21 more citations in biology and 8
more citations in physics than the average publication without
a super tie. This is not a tail effect, because the citation
boost factor α_R = 1.17 applies a multiplicative shift to the
entire citation distribution, P(c|R_p = 1) ≈ P(α_R c|R_p = 0),
thereby impacting publications above and below the average.
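Why a multiplicative boost is not a tail effect can be illustrated numerically: scaling every citation count by 1.17 multiplies every quantile of the distribution by the same factor (equivalently, a constant shift of ln 1.17 ≈ 0.157 in log space), while the absolute gain in citations grows with the quantile. The log-normal parameters below are arbitrary placeholders, not fitted to the data in this study.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical log-normal citation counts for papers without a super tie.
c_without = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)
alpha = 1.17                    # citation boost factor quoted in the text
c_with = alpha * c_without      # multiplicative shift of the whole distribution

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
q_without = np.quantile(c_without, qs)
q_with = np.quantile(c_with, qs)
ratios = q_with / q_without     # the same factor (1.17) at every quantile
gaps = q_with - q_without       # absolute gain increases with the quantile
print(ratios)
print(gaps)
```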


Discussion 

The characteristic collaboration size in science has been
steadily increasing over the last century [4, 7, 21], with
consequences at every level of science, from education and
academic careers to universities and funding bodies [8].
Understanding how this team-oriented paradigm shift affects the
sustainability of careers, the efficiency of the science system,
and society's capacity to overcome grand challenges will be
of great importance to a broad range of scientific actors, from
scientists to science policy makers.


Collaborative activities are also fundamental to the career
growth process, particularly in disciplines where research
activities require a division of labor. This is especially true in
biology and physics research, where computational,
theoretical, and experimental methods provide complementary
approaches to a wide array of problems. As a result, a
contemporary research group leader is likely to find the
assembly of a team, one composed of individuals with
diverse yet complementary skill sets, a daunting task,
especially when under constraints to optimize financial resources,
valuable facilities, and other material resources. Online social
network platforms such as VIVO (http://www.vivoweb.org/)
and Profiles RNS (http://profiles.catalyst.harvard.edu/), which
serve as match-making recommendation systems, have been
developed to ease the challenges of team assembly.

Our analysis indicates that 2/3 of the collaborations
analyzed here are “weak”. Nevertheless, the remaining strong
ties represent social capital investments that can have
important long-term implications, for example on information
spreading [17], career paths [36], and access to key strategic
resources [37]. In the private sector, strong ties facilitate
access to new growth opportunities, playing an important role in
sustaining the competitiveness of firms and employees [38].
These considerations further show why it is important for
researchers to understand the opportunities that exist within
their local network. Understanding the redundancies in the
local network [39] and the interaction capacity of team
members [25] can help a group leader optimize group intelligence
[26] and monitor team efficiency [24], thereby constituting a
source of strategic competitive advantage.

In summary, we developed methods to better understand
the diversity of collaboration strengths. We focused on the
career as the unit of analysis, operationalized by using an ‘ego’
perspective so that collaborations, publications, and impact
scores fit together into a temporal framework ideal for
cross-sectional and longitudinal modeling. Analyzing more than
166,000 collaborations, we found that a remarkable 60%-80%
of the collaborations last only L_ij = 1 year. Within the subset
of repeat collaborations (L_ij ≥ 2 years), we find that roughly
2/3 of these collaborations last less than a scientist's average
duration ⟨L_i⟩ ≈ 5 years, yet 1% last more than 4⟨L_i⟩ ≈ 20
years. This wide range in duration and the disparate
frequencies of long and short L_ij together point to the dichotomy of
burstiness and persistence in scientific collaboration. Closer
inspection of individual career paths suggests how idiosyncratic
events, such as changing institutions or publishing a seminal
study or book, can have significant downstream impact on the
arrival rate of new collaboration opportunities and tie
formation (see Figs. 1 and S1). Also, the frequency of relatively
large publication overlap measures (f_{K,i} and f_{N,i}) indicates
that career partners occur rather frequently in science.

In the first part of the study we provide descriptive insights
into basic questions such as how long typical collaborations
last, how often a scientist pairs up with his/her main
collaborator, and what the characteristic half-life of a
collaboration is. We also found that as the career progresses,
researchers become attractors rather than pursuers of new
collaborations. This attractive potential can contribute to cumulative
advantage [30, 31], as it provides select researchers access
to a large source of collaborators, which can boost productivity
and increase the potential for a big discovery.

We operationalized tie strength using an ego-centric
perspective of the collaboration network. Because the number of
publications K_ij between the central scientist i and a given
coauthor j was found to be exponentially distributed, the
mean value ⟨K_i⟩ is a natural author-specific threshold that
distinguishes the strong (K_ij ≥ ⟨K_i⟩) from the weak ties
(K_ij < ⟨K_i⟩). Within the subset of strong ties we
identified ‘super tie’ outliers using the analytic extreme-statistics
threshold defined in Eq. [4]. Also, because the number of
publications produced by a collaboration is highly correlated
with its duration, a super tie also represents persistence that is
in excess of the stochastic churn rate that is characteristic of
the scientific system. On a per-collaborator basis, the fraction
of coauthors within a research profile that are super ties
was remarkably similar across datasets, indicating that super
ties occur at an average rate of 1 in 25 collaborators.
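The strong/weak split by the author-specific mean ⟨K_i⟩ can be sketched as below. Eq. [4] (the extreme-statistics threshold for super ties) lies outside this section, so the sketch takes the super-tie cutoff as an input parameter rather than guessing its form; treating values strictly above the cutoff as super ties is likewise an assumption.

```python
from statistics import mean

def classify_ties(tie_strength, super_threshold):
    """Split coauthors by K_ij relative to the author-specific mean <K_i>.

    tie_strength    -- {j: K_ij}, joint publications of i with coauthor j
    super_threshold -- cutoff from Eq. [4] (supplied by the caller; not derived here)
    """
    k_mean = mean(tie_strength.values())
    strong = {j for j, k in tie_strength.items() if k >= k_mean}
    weak = set(tie_strength) - strong
    # Super ties are the strong-tie outliers above the cutoff (strict > is assumed).
    super_ties = {j for j in strong if tie_strength[j] > super_threshold}
    return weak, strong, super_ties

weak, strong, super_ties = classify_ties({"a": 1, "b": 1, "c": 2, "d": 8}, 5)
print(weak, strong, super_ties)  # <K_i> = 3: weak {a, b, c}, strong {d}, super {d}
```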

There are various candidate explanations for why such
extremely strong collaborations exist. Prosocial motivators may
play a strong role, i.e. for some researchers doing science in
close community may be more rewarding than going alone.
Also, the search for and formation of a compatible partnership
require time and other social capital investment, i.e.
networking. Hence, for two researchers who have found a
collaboration that leverages their complementarity, the potential
benefits of improving on their match are likely outweighed
by the long-term returns associated with their stable
partnership. Complementarity, and the greater skill set the
partnership brings, can also provide a competitive advantage by
way of research agility, whereby a larger collective resource
base can facilitate rapid adjustments to new and changing
knowledge fronts, thereby balancing the risks associated with
changing research direction. After all, a first-mover advantage
can make a significant difference in a winner-takes-all credit
& reward system [2].

Scientists may also strategically pair up in order to share
costs, rewards, and risk across the career. In this light, an
additional incentive to form super ties may be explained, in
part, by the benefits of reward-sharing in the current scientific
credit system, wherein publication and citation credit arising
from a single publication are multiplied across the a_p
coauthors in everyday practice. Considered in this way, the career
risk associated with productivity lulls can be reduced if a close
partnership is formed. For example, we observed a few ‘twin
profiles’ characterized by a publication overlap fraction f_{K,i}
between the researcher and his/her top collaborator that was
nearly 100%. Moreover, we found that 9% of the biologists
and 20% of the physicists shared 50% or more of their
papers with their top collaborator. This highlights a particularly
difficult challenge for science, which is to develop a credit
system that appropriately divides the net credit, but at the
same time does not reduce the incentives for scientists to
collaborate [8, 27-29]. Thus, it will be important to consider these
relatively high levels of publication and citation overlap in the
development of quantitative career evaluation measures;
otherwise there is no penalty to discourage coauthor free-riding
[7].
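The publication overlap f_{K,i} used above (the fraction of i's publications coauthored with the top collaborator) and the share of researchers at or above 50% overlap can be sketched as follows. The input format, a `({j: K_ij}, N_i)` pair per profile, is a hypothetical choice for illustration; f_{N,i} is not covered because its definition is not given in this section.

```python
def top_collaborator_overlap(tie_strength, n_publications):
    """f_K,i: fraction of i's N_i publications coauthored with the top collaborator."""
    return max(tie_strength.values()) / n_publications

def share_with_high_overlap(profiles, cutoff=0.5):
    """Fraction of profiles with f_K,i >= cutoff.

    profiles -- hypothetical list of ({j: K_ij}, N_i) pairs
    """
    f = [top_collaborator_overlap(k, n) for k, n in profiles]
    return sum(x >= cutoff for x in f) / len(f)

# Toy profiles (invented numbers): f_K = 0.6, 0.1, 0.98.
profiles = [({"a": 30, "b": 5}, 50), ({"a": 10}, 100), ({"x": 49, "y": 2}, 50)]
print(share_with_high_overlap(profiles))  # 2 of 3 profiles reach 50% overlap
```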

We concluded the analysis by implementing two fixed-effects
regression models to determine the sign and strength
of the ‘apostle effect’ represented by β_p (productivity) and β_R
(citations). Together, these two coefficients address the
fundamental question: is there a measurable advantage associated
with heavily investing in a select group of research partners?

In the first model we measured the impact of super ties on
a researcher's annual publication rate, controlling for career
age, average team size, the prior experience of i with his/her
coauthors, and the relative contribution of super ties within
year t as measured by p_{i,t} in Eq. [5]. We found larger p_{i,t} to
be associated with above-average productivity (β_p > 0),
indicating that super ties play a crucial role in sustaining career
growth. We also found increased levels of prior experience to
be associated with decreased productivity (β_L < 0),
suggesting that maintaining redundant ties conflicts with the potential
benefits from mixing new collaborators into the environment.
Nevertheless, higher inequality in the concentration of prior
experience was found to be associated with higher productivity
(β_G > 0).

In the second regression model we analyzed the impact of
super ties on the citation impact of individual publications,
using the detrended citation measure z_{i,p,y} defined in Eq. [7].
This citation measure is normalized within publication-year
cohorts, thus allowing for a comparison of citation counts
for research articles published in different years. We found
that publications coauthored with super ties, corresponding
to 52% of the papers we analyzed, have a significant increase
in their long-term citations (β_R > 0). In the SI Text we
provide additional evidence for the apostle effect, showing
that publications with super ties receive 17% more citations.
This added value may arise from the extra visibility the
publication receives, since the super-tie collaborator may
also contribute a substantial reputation and future productivity
that promote the visibility of the publication. This type of
network-mediated reputation spillover is corroborated by a
recent study finding a significant citation boost attributable to
a researcher's centrality within the collaboration network [40].

Policy recommendations. In all, these results provide
quantitative insights into the benefits associated with strong
collaborative partnerships and the value of skill-set
complementarity, social trust, and long-term commitment. This
data-oriented analysis also contributes to the literature on the
science of science policy [41], providing insight and guidance,
in an increasingly metrics-based evaluation system, on how
to account for individual achievement in team settings. One
particularly relevant scenario is fellowship, tenure, and career
award evaluations, where it is common practice to consider
“independence from one's thesis advisor” as a selection
criterion. We show that in order to assess a researcher's
independence, evaluation committees should also take into
consideration the level of publication overlap between a
researcher and his/her strongest collaborator(s), e.g. f_{K,i}
and f_{N,i}. Yet at the same time, the beneficial role of super
ties, as we have quantitatively demonstrated, should also
be acknowledged and supported. For example, funding
programs might consider career awards that are specifically
multipolar [8], which would also benefit the research partners 
in academia who are actually life partners, and who may 
face the daunting “two-body problem” of coordinating two 
research careers. Furthermore, understanding the basic levels 
of publication overlap in science is also important for the 
ex post facto review of funding outcomes as a means to 
evaluate the efficiency of science. In large-team settings, 
measuring the efficiency of a laboratory or project is difficult 
without a better understanding of how to measure overlapping 
labor inputs (i.e., collaborator contributions) relative to the 
project outputs (e.g., publications, patents, etc.). Finally, 
our study informs early career researchers, who are likely to
face important decisions concerning the (possibly strategic)
selection of collaborative opportunities, on the positive impact
that the right research partner can have on their careers'
long-term sustainability and growth. In all, our results
provide quantitative insights into the benefits associated with 
strong collaborative partnerships, pointing to the added value 
derived from skill-set complementarity, social trust, and 
long-term commitment. 